Correlations in the Data

Planting for Resilience

Last Updated On:  2024-02-09 07:38:40.272494
Author: Philipe Bujold


We used phi-k correlation to explore and quantify the strength and nature of relationships between the categorical variables in the dataset. Unlike traditional correlation measures, phi-k accommodates both ordinal and nominal variables, which makes it particularly well suited to survey data that mixes question types and response options.

The primary rationale for applying phi-k correlation here was to uncover links between variables that are not numeric. That is, response categories could be treated as categories rather than being converted into ordinal values. In this way, we could correlate survey responses ranging from demographic information to perceptions about vetiver and longer-form answers.
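To illustrate how an association can be measured on categories directly, without recoding them as numbers, here is a minimal sketch. It is not the phi-k computation used in this analysis: phi-k builds on Pearson's chi-square test of a contingency table with further refinements, and the sketch swaps in Cramér's V, a simpler measure built on the same chi-square statistic. The function name and the sample villages and answers are made up for illustration.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of category labels.

    Illustrative stand-in for phi-k (NOT the same coefficient):
    both start from the chi-square statistic of a contingency table.
    """
    n = len(xs)
    observed = Counter(zip(xs, ys))   # joint counts per (x, y) cell
    row_totals = Counter(xs)          # marginal counts of x
    col_totals = Counter(ys)          # marginal counts of y

    # Chi-square: compare each observed cell count with the count
    # expected if the two variables were independent.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            chi2 += (observed.get((r, c), 0) - expected) ** 2 / expected

    # Normalise to the 0..1 range; k is the smaller table dimension minus 1.
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical survey rows: village of the respondent and a yes/no answer.
villages = ["A", "A", "A", "B", "B", "B"]
answers = ["yes", "yes", "no", "no", "no", "yes"]
print(round(cramers_v(villages, answers), 2))  # 0.33
```

The category labels are only ever counted, never ordered or converted to numbers, which is the property the paragraph above relies on.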

Understanding these correlations is important because they can highlight:

  1. Patterns in the data that we would not expect to be there.

  2. Linkages between variables that should be accounted for during intervention design and implementation.

  3. Patterns in the data that go beyond respondent clustering.

# quant_cols = mult_cols + quant_cols + ['Behavioral cluster']
quant_cols = quant_cols + ['Behavioral cluster']
interval columns not set, guessing: ['latitude', 'longitude', 'altitude', 'hh_mems_elig', 'HR3_age', 'Q6_How_many_people_live_in_yo', 'Q10_How_many_years_ived_in_this_village', 'Q32_Now_think_abou_ity_decided_to_do_so', 'Q33_Again_think_ab_ity_decided_to_do_so']

Things to Note

  • phi-k values range from 0 to 1; unlike Pearson's r, phi-k does not indicate the direction of an association

  • Interpreting phi-k values:

    • Small or Weak Association: phi-k between 0.1 and 0.3.

    • Moderate Association: phi-k between 0.3 and 0.5.

    • Strong Association: phi-k greater than 0.5.
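The bands above can be turned into a small labelling helper, for example to annotate a table of phi-k values. This is a sketch only: the function name is hypothetical, the "negligible" label for values below 0.1 is an assumption (the list above does not name that range), and so is the choice to put the shared boundary 0.3 in the moderate band.

```python
def describe_phik(value):
    """Map a phi-k value to the association bands listed above.

    Assumptions (not stated in the source): values below 0.1 are
    labelled "negligible", and the boundary 0.3 counts as moderate.
    Strong is strictly greater than 0.5, as in the list above.
    """
    if value > 0.5:
        return "strong"
    if value >= 0.3:
        return "moderate"
    if value >= 0.1:
        return "weak"
    return "negligible"


# Made-up phi-k values, for illustration only.
for v in (0.05, 0.2, 0.42, 0.8):
    print(v, describe_phik(v))
```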

Below are some of the more “unexpected” correlations, plotted to show how each main variable (named in the title at the bottom of its figure) may influence other variables.

[Figures: phi-k correlation plots for Village, username, and Behavioral cluster]

What this means:

username (i.e. the enumerator), Village, and Behavioral cluster all show clear correlations with other variables in the data. Here is what that can mean for the six aggregated key variables in which we are interested:

  1. Salience of Loss: How intensely do villagers feel about flood risks and the impact of erosion on their lives?

  2. Choice Uncertainty: What are the options for reducing flood risks?

  3. Outcome Efficacy: Does using vetiver grass work?

  4. Collective Efficacy: Can we successfully use vetiver grass together, as a community?

  5. Self Efficacy: Can I successfully plant and maintain vetiver grass?

  6. Material Access: Can we access the resources needed to use vetiver?

fig_list = survey.create_combined_heatmap_charts(
    labelled_df,
    ['Salience of loss', 'Choice uncertainty', 'Outcome efficacy',
     'Collective efficacy', 'Self efficacy', 'Material access'],
    question_type,
    ['username'],
    labels,
    title="Combined Radar Plot",
)